Class Prior Estimation from Positive and Unlabeled Data
Authors
Abstract
We consider the problem of learning a classifier using only positive and unlabeled samples. In this setting, it is known that a classifier can be successfully learned if the class prior is available. However, in practice, the class prior is unknown and thus must be estimated from data. In this paper, we propose a new method to estimate the class prior by partially matching the class-conditional density of the positive class to the input density. By performing this partial matching in terms of the Pearson divergence, which we estimate directly without density estimation via lower-bound maximization, we can obtain an analytical estimator of the class prior. We further show that an existing class prior estimation method can also be interpreted as performing partial matching under the Pearson divergence, but in an indirect manner. The superiority of our direct class prior estimation method is illustrated on several benchmark datasets.
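The partial-matching idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes a Gaussian-kernel linear model for the density ratio p(x|y=1)/p(x), fitted by regularized least squares, and it assumes the class prior is then the value of θ that minimizes the resulting quadratic approximation of the Pearson divergence between θ·p(x|y=1) and p(x). The names `gaussian_basis` and `estimate_class_prior` and the parameters `sigma`, `lam`, and `n_centers` are hypothetical, and their fixed values stand in for whatever model selection the paper actually uses.

```python
import numpy as np


def gaussian_basis(x, centers, sigma):
    """Gaussian kernel features: x is (n, d), centers is (b, d), result is (n, b)."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def estimate_class_prior(x_pos, x_unl, sigma=1.0, lam=1e-3, n_centers=100, seed=0):
    """Sketch of an analytic class-prior estimate from positive and unlabeled samples.

    Assumption: the ratio r(x) = p(x|y=1) / p(x) is modeled as alpha^T phi(x)
    and fitted by regularized least squares. The Pearson divergence is then
    approximated by (theta^2 / 2) * A - theta + 1/2, where A estimates the
    integral of p(x|y=1)^2 / p(x), so the minimizer is theta = 1 / A.
    """
    rng = np.random.default_rng(seed)
    n_centers = min(n_centers, len(x_pos))
    centers = x_pos[rng.choice(len(x_pos), size=n_centers, replace=False)]

    phi_pos = gaussian_basis(x_pos, centers, sigma)
    phi_unl = gaussian_basis(x_unl, centers, sigma)

    h = phi_pos.mean(axis=0)                  # mean feature over positive samples
    H = phi_unl.T @ phi_unl / len(x_unl)      # second moment over unlabeled samples
    G = H + lam * np.eye(n_centers)           # ridge-regularized system matrix

    alpha = np.linalg.solve(G, h)             # least-squares ratio coefficients
    A = 2.0 * h @ alpha - alpha @ H @ alpha   # estimate of the integral of p1^2 / p
    return 1.0 / A


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 1-D data: positives near 0, negatives near 6, true prior 0.3.
    x_pos = rng.normal(0.0, 1.0, size=(500, 1))
    x_unl = np.vstack([rng.normal(0.0, 1.0, size=(600, 1)),
                       rng.normal(6.0, 1.0, size=(1400, 1))])
    print(estimate_class_prior(x_pos, x_unl))
```

When the two classes overlap heavily, an estimate of this form tends to exceed the true prior, so the sketch is best read as an illustration of the closed-form structure rather than as a tuned estimator.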
Similar resources
Estimating the class prior and posterior from noisy positives and unlabeled data
We develop a classification algorithm for estimating posterior distributions from positive-unlabeled data, that is robust to noise in the positive labels and effective for high-dimensional data. In recent years, several algorithms have been proposed to learn from positive-unlabeled data; however, many of these contributions remain theoretical, performing poorly on real high-dimensional data tha...
Analysis of Learning from Positive and Unlabeled Data
Learning a classifier from positive and unlabeled data is an important class of classification problems that are conceivable in many practical applications. In this paper, we first show that this problem can be solved by cost-sensitive learning between positive and unlabeled data. We then show that convex surrogate loss functions such as the hinge loss may lead to a wrong classification boundar...
Estimation of Squared-Loss Mutual Information from Positive and Unlabeled Data
Capturing input-output dependency is an important task in statistical data analysis. Mutual information (MI) is a vital tool for this purpose, but it is known to be sensitive to outliers. To cope with this problem, a squared-loss variant of MI (SMI) was proposed, and its supervised estimator has been developed. On the other hand, in real-world classification problems, it is conceivable that onl...
Nonparametric semi-supervised learning of class proportions
The problem of developing binary classifiers from positive and unlabeled data is often encountered in machine learning. A common requirement in this setting is to approximate posterior probabilities of positive and negative classes for a previously unseen data point. This problem can be decomposed into two steps: (i) the development of accurate predictors that discriminate between positive and ...
Improved Estimation of Class Prior Probabilities through Unlabeled Data
Work in the classification literature has shown that in computing a classification function, one need not know the class membership of all observations in the training set; the unlabeled observations still provide information on the marginal distribution of the feature set, and can thus contribute to increased classification accuracy for future observations. The present paper will show that thi...
Journal title:
- IEICE Transactions
Volume 97-D, Issue
Pages -
Publication date: 2014